LANGUAGE TRANSLATION SYSTEM AND METHOD 



REFERENCE TO RELATED APPLICATIONS 

This application claims the benefit of U.S. provisional patent application serial 
number 60/428,547, filed November 22, 2002 and entitled "Language Translation 
System And Method", the disclosure of which is incorporated herein by reference. 

TECHNICAL FIELD 

The present invention relates to multilingual communications over a computer
network, and more particularly, to a system and method for improved language
translation and delivery of textual portions of communications sent over a computer
network.

BACKGROUND OF THE INVENTION 

Language translation is the transfer of the meaning of a text from one 
language to another for readership. Language translation methods have evolved over 
the years and vary from traditional human translation to machine translation to 
machine translation with a human translation component. Various pre- and post- 
translation editing techniques have also been employed to increase the accuracy of 
translated text. Human translators use a variety of thought processes, skills and 
resources to interpret the meaning of a sentence and communicate the meaning of that
sentence in a different language. They are expert at the proper grammar, idiomatic 
turn of phrase, and specialty vocabulary areas, which ensures a translation that will be 
clearly understood in the target language. Understandably, the automation of this 
human process has proven to be challenging and costly, and to date the publication of 
translated documents often requires the involvement of a human translator acting as 
an editor. 

With the advent of networked computers and the Internet, and the resulting
cheap, instant global messaging, information retrieval, and file transfer capabilities, 
the need for improved, automated, and highly accurate translation capabilities is 
greater than ever. While human translation is unquestionably the preferred method
for producing accurate and idiomatic translations, it remains prohibitively expensive
and too time-consuming to meet the new demands of businesses and individuals
working at Internet speeds. Today, multinational corporations communicate
with their international offices and partners on a daily basis. In order for
organizations to maintain a competitive edge, personnel must have the
ability to collaborate with colleagues around the globe. Successful partnerships with
international colleagues require that personnel have access to immediate translations
of foreign-language documents, intranet content, and cross-language communications
via workgroups and e-mail.

Some Internet web sites allow a user to obtain a translation of a web page
from one language into another, or allow the translation of a given textual matter from
one language into another. Web sites such as www.altavista.com and its Babelfish™
program, for example, provide Internet access to machine translation tools which can
translate text using one of the many commonly known methods of machine translation.
Other systems, such as LanguageLine™ Services from AT&T, provide fast
voice translation services to assist with language translation needs via telephone.
Unfortunately, such systems and web sites do not provide consistently accurate or
context-related translations and are therefore not suitable for quickly and effectively
translating broad ranges of communications.

Search engines generally fare no better with multilingual content, because they are
not known to maintain databases in more than one language. If a user inputs
keywords in the English language, the search engine will only search for web pages
containing the English keywords. It is therefore unlikely that the search engine will
discover web pages which contain, for example, the French translation of the input
keywords. Accordingly, although a web page drafted in the French
language may be highly relevant to the English keywords and of particular interest to
the user, the search engine is unlikely to detect the French web page. In addition,
current search engines typically first return to the user abstracts or small portions of
text from the web pages discovered during the search. If a web page happens to be in
a foreign language, the abstract or text will be presented to the user in that foreign
language. Accordingly, the user will not be able to understand the search results
without retrieving the web pages and then translating the text. The quality of the
search results can thereby suffer.

The language translation challenge is also significant in the context of e-mail
and chat messages. Oftentimes, a user will desire to send a message to another party
who is not fluent in the user's native language. The user must then
create the message in the native language, initiate some process for translating the
message into the foreign language, and then send the message to the other party.
While software programs and Internet web sites exist for translating text from one
language to another, such processes are burdensome to the user. The user's e-mail or
chat applications must either be modified to include translation software or be
configured to interface with it. The user is also required to take affirmative steps to
ensure that the translation is performed prior to sending the message. This burden
arises whether the message is sent as e-mail, an instant message, a short message service
(SMS) message, or in another format. Translation of SMS messages is particularly challenging
given the myriad devices, operating systems, and networks involved in SMS
messaging.

The present invention focuses on the development and improvement of
machine translation efficiency, quality and accuracy.

It is thus one object of the present invention to provide a system for automatic 
translation of user defined communications in a computer network. 

It is another object of the invention to provide improved language translation
services to Internet users and remote device users without requiring substantial
modifications to the user's existing hardware or software.

It is another object of the present invention to provide highly accurate
translations of textual communications through automated dictionary selection and
deployment.
It is a further object of the present invention to provide a quick, efficient
method for machine translation over a computer network whereby dictionaries can be
continuously augmented and adjusted for more accurate communications.

It is yet another object of the present invention to provide a method and
system for machine translation over a computer network which allows users to
communicate in different languages in real time using specialized dictionaries.

It is still another object of the present invention to provide a comprehensive,
easy-to-access database of specialized dictionaries.

It is another object of the present invention to provide a system for performing
machine translation for different source languages, target languages, and
sublanguages, and automatically sending the translated text via telecommunications
links to one or more recipients in different languages and/or in different locations.

It is still another object of the present invention to provide a system and 
method for enhanced levels of translation accuracy based on context recognition and 
sub-language dictionary application. 

It is yet another object of the present invention to provide a system and 
method for text translation which is capable of being upgraded easily through 
subsequent dictionary inputs from users. 

It is yet another object of the present invention to provide a system and
method for accurate, real-time translation of various text messages, including SMS
messages.

DISCLOSURE OF THE INVENTION 

By the present invention, there is thus provided a system for translation of
electronic communications that automatically selects and deploys specialized
dictionaries based upon context recognition and other factors. The system includes a
machine translation component which can access a database of specialized
dictionaries and can also deploy search agents to search the Internet for
complementary specialized translation dictionaries. Software tools can be employed
to allow each dictionary to be modified, augmented, and supplemented to become
more complete and accurate for a given contextually sensitive translation. The system
and method of the invention can be used to translate electronic mail, instant messages,
chat, SMS, electronic text and word processing files, Internet web pages, Internet
search results, and other textual communications. The system can accept a wide
variety of inputs converted to text, including facsimiles and speech inputs, and can
translate based upon specialized sub-dictionaries, including user-specific dictionaries.
In one aspect, a network of readily accessible dictionaries is provided whereby
dictionary owners can be compensated for the use of their specialized dictionaries.

The present invention assists both in the assimilation of translated foreign-language
information for one's own purposes and in the dissemination of translated
native-language information for receipt by a foreign-language individual. The present
invention can employ comprehensive dictionaries and a collection of linguistic rules
that translate one language into another without relying on human translators. The
present invention can interpret the structure of sentences in the source language (the
language the user is translating from) and generate a translation based on the rules of
the target language (the language the user is translating to). The process involves
breaking down complex and varying sentence structures, identifying parts of speech,
resolving ambiguities, and synthesizing the information into the components and
structure of the new language.
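
The analysis, transfer, and synthesis steps described above can be illustrated with a deliberately tiny transfer-style sketch. The lexicon, the tag names, and the single reordering rule (English adjective-noun becoming French noun-adjective) are hypothetical stand-ins chosen for illustration; a real system would also resolve ambiguities and handle agreement, which this sketch does not attempt.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: English word -> (part of speech, French equivalent).
LEXICON: Dict[str, Tuple[str, str]] = {
    "the": ("DET", "le"),
    "red": ("ADJ", "rouge"),
    "book": ("NOUN", "livre"),
    "falls": ("VERB", "tombe"),
}

Tagged = List[Tuple[str, str]]  # (source word, part of speech)


def analyze(sentence: str) -> Tagged:
    """Source analysis: identify each word's part of speech (unknown words raise KeyError)."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged: Tagged) -> Tagged:
    """Structural transfer: reorder English ADJ NOUN into French NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the swapped pair
        else:
            i += 1
    return out


def generate(tagged: Tagged) -> str:
    """Target synthesis: substitute target-language words in the new order."""
    return " ".join(LEXICON[word][1] for word, _ in tagged)


print(generate(transfer(analyze("The red book falls"))))  # le livre rouge tombe
```

The three functions mirror the three stages named in the text, so a richer analyzer or rule set could replace any one of them without touching the others.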

In one embodiment, the present invention combines machine translation with
other communication and knowledge management tools in order to create the ability,
in real time over a network, to (1) convert speech to text (STT) with the highest
accuracy level and speed possible; (2) port the STT output to an open-architecture
machine translation system using a larger range of both language-specific and
context-specific lexicons; (3) identify changes in dynamic content for real-time
dictionary selection; and (4) represent the output as synthesized speech on any type of
communication device.
The translation communication services of the present invention extend
translation to standard services such as e-mail, fax and voicemail services over the
Internet. For example, senders could write a fax or e-mail in their native language,
automatically translate it, and post-edit the translation before sending.

According to one aspect, the present invention includes an SMS message
routing component that transmits and receives short message service (SMS) data
packets via a communications network. The routing component includes an SMS
message translation database that contains information used to determine the
translation for a received SMS message. The message translation database includes
data used to identify a sending and/or receiving party attribute of an SMS message, as
well as translation processing instructions. Such translation processing instructions
can include context-specific translation instructions. In one aspect, the present
invention can provide an SMS translation component that is readily accessible regardless of
device type, network operator or device operating system.
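
As a minimal sketch, assuming the message translation database is a simple table mapping each party's address to a preferred language, the routing component's lookup might resemble the following. The names `SmsMessage`, `TranslationRule`, `PARTY_LANGUAGE`, `lookup_rule`, and `route_sms` are hypothetical, and the `translate` callable stands in for the translation server.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class SmsMessage:
    sender: str     # originating address, e.g. a phone number
    recipient: str  # destination address
    text: str


@dataclass
class TranslationRule:
    source_lang: str
    target_lang: str
    context: Optional[str] = None  # optional context-specific dictionary hint


# Hypothetical message translation database: party address -> preferred language.
PARTY_LANGUAGE: Dict[str, str] = {
    "+15550100": "en",
    "+33155501234": "fr",
}


def lookup_rule(msg: SmsMessage, context: Optional[str] = None) -> Optional[TranslationRule]:
    """Derive translation processing instructions from sender/recipient attributes."""
    src = PARTY_LANGUAGE.get(msg.sender)
    dst = PARTY_LANGUAGE.get(msg.recipient)
    if src is None or dst is None or src == dst:
        return None  # unknown party or shared language: no translation needed
    return TranslationRule(src, dst, context)


def route_sms(msg: SmsMessage,
              translate: Callable[[str, TranslationRule], str],
              context: Optional[str] = None) -> SmsMessage:
    """Translate the message body if the database calls for it, then forward it."""
    rule = lookup_rule(msg, context)
    if rule is None:
        return msg
    return SmsMessage(msg.sender, msg.recipient, translate(msg.text, rule))
```

Because the lookup depends only on party attributes held on the network side, nothing in this sketch depends on the handset's device type or operating system. A stub such as `lambda text, rule: text` is enough to exercise the routing logic.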

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a functional block diagram of one environment in which the present
invention may provide multilingual service capability across a computer network.

FIG. 2 is a block diagram of one aspect of the present invention, showing how
a speech input can be converted to text, translated, converted back to speech and
output using the translation system of the present invention.

FIG. 3 shows an example of a portion of a dictionary database architecture for 
use in connection with the system and method of the present invention. 

FIG. 4 is an exemplary user interface which may be presented on an end user
device in accordance with the present invention.
FIG. 5 is a state diagram illustrating a progression of tasks performed during
the sending of an electronic mail communication in accordance with one embodiment
of the present invention.

FIG. 6 is a state diagram illustrating a progression of tasks performed during a
multilingual search transaction in accordance with one embodiment of the present
invention.

FIG. 7 is a state diagram illustrating a progression of tasks performed during a
dictionary search routine and resulting business processes in accordance with one
embodiment of the present invention.

FIG. 8 is a diagram of a short message service (SMS) network for use in 
accordance with one aspect of the present invention. 

FIG. 9 is a sample schematic showing one environment in which the system of 
the present invention may be employed. 



DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT



The present invention is directed to a system and method for performing
language translation functions for communications over a computer network. As
shown in Fig. 1, by the present invention there is provided a translation system 10
having a translation gateway 12 for receiving and translating communications sent
over a computer network 14. In one embodiment of the invention, the computer
network can be the Internet. The gateway 12 may be functionally separated into an
interface server 16 and a translation server 18. The interface server 16 has the ability
to receive a communication having a textual portion authored in a first language, such
as may be transmitted from an end user device 20 such as a standard personal
computer adapted with hardware and software to communicate over the network.
The end user device 20 can be connected to the computer network by conventional means,
such as a modem or a direct connection through a local area network, wide area
network, or other similar means. While connected, the end user device 20 and the
interface server 16 can communicate via the Internet using standard communication
protocols. In certain embodiments, the interface server 16 may function in an OEM, or
back-end, configuration, such as when an end user device 20 or a remote server
comprises a search engine front-end. In such configurations, a custom
communication protocol may be employed.

The end user devices 20 can also be equipped with application software that
allows a user to interact with services offered over the network 14. For instance, the
end user devices 20 may include standard browser software for receiving web pages
over the Internet and for interpreting documents created in HTML. Also, the end user
devices 20 may include other application software, such as electronic mail ("e-mail")
applications, File Transfer Protocol ("FTP") applications and other file transfer
applications, chat room applications, newsgroup applications, instant messaging
applications, short message service (SMS) applications and the like, to interact with
other services offered over the Internet or other network. Alternatively, one or more of
the end user devices 20 may be other search engines operating in cooperation with the
interface server 16. For instance, the end user devices 20 may be search engine
front-ends provided by other service providers, which pass information between an
actual end user and the interface server 16.

The interface server 16 in accordance with one embodiment of the present
invention includes the capability to provide the user with a seamless interface to
resources of the Internet, which may exist in many languages. In other
words, the interface server 16, acting in conjunction with a translation server 18,
includes the ability to translate information received from the user from a first
language to a second language, and to translate information destined to the user from
a second language to the first language. In addition, the interface server 16 and the
translation server 18 provide translation services with minimum deviation from
traditional methods of interfacing with Internet resources. In one aspect of the present
invention, the interface server 16 is accessible via remote devices sending and
receiving text and short message service (SMS) messages. The functions of the
interface server 16 and the translation server 18 are discussed in greater detail below.

The interface server 16 can forward a communication, or portions thereof, to
the translation server 18. The translation server 18 translates the textual portion of the
communication to another language. In one embodiment, the translation server 18
converts each word from the native language to the language identified as the target
language, using syntactic and semantic analysis algorithms as known in the art. The
interface server 16 then receives the translated textual portion from the translation
server 18, constructs the translated communication, if necessary, based on the
translated textual portion, and finishes processing the communication in the manner
desired by the user 20. The communication may represent an e-mail message, a chat
message, a keyword search request, a web page (e.g., an HTML file), an SMS
message, a URL, or any other transmission of data from one network node to another
network node. Accordingly, the gateway may be responsible for translating and
routing e-mail messages, SMS messages, chat messages, keywords and/or database
queries, URLs, abstracts and other information pertaining to web pages, message
communications, and other types of data files.
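
The division of labor between the interface server 16 and the translation server 18 can be sketched as follows, assuming a communication is modeled as a textual portion plus routing headers that pass through untranslated. The `Communication` type, the `interface_server` function, and the `Translator` signature are hypothetical illustrations, not the gateway's actual interfaces.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict

# (text, source language, target language) -> translated text
Translator = Callable[[str, str, str], str]


@dataclass(frozen=True)
class Communication:
    kind: str          # e.g. "email", "chat", "sms", "search"
    text: str          # textual portion authored in the first language
    source_lang: str
    target_lang: str
    headers: Dict[str, str] = field(default_factory=dict)  # routing data, not translated


def interface_server(comm: Communication, translation_server: Translator) -> Communication:
    """Forward the textual portion for translation and rebuild the communication."""
    if comm.source_lang == comm.target_lang:
        return comm  # nothing to translate
    translated = translation_server(comm.text, comm.source_lang, comm.target_lang)
    # Reconstruct the communication around the translated text; headers pass through.
    # The text is now in the target language, so source_lang is updated to match.
    return replace(comm, text=translated, source_lang=comm.target_lang)
```

Keeping the communication immutable and returning a rebuilt copy means the original message remains available, for example for post-editing before it is sent.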

In one embodiment of the invention, the interface server 16 may include
search engine functionality for transmitting database search queries to a search engine
database 25. The Internet may connect the interface server 16 to the search engine
database 25. Alternatively, a direct network connection may connect the interface
server 16 to the search engine database 25. One example of the search engine
database 25 is the Internet database maintained by the Inktomi Corporation, which is
well known in the art. The search engine database 25 may include information
referring to many hundreds of thousands, even millions, of web pages published on
the Internet. Within the search engine database 25, information may associate the
location of a data file with multiple keywords describing the content of the web page.
The keywords stored in the search engine database 25 may be extracted from words
present within each web page, such as text within the web page or text stored in
"meta-tags" within the web page. As is well known in the art, meta-tags are portions
of a web page which are not visible to a user, but which can contain text describing
the web page. Generally, keywords are only stored within the search engine database
25 in one language: the native language of the web page. Consequently, keywords
are only searchable in the search engine database 25 in the native language of the
keywords. Thus, if keywords happen to be in the French or German language, the
search engine database 25 should be queried in that language.
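
Because the index only holds native-language keywords, one way the interface server could reach foreign-language pages is to translate the user's keywords into each candidate language before querying. The sketch below assumes toy keyword dictionaries and a toy in-memory index standing in for search engine database 25; all names, URLs, and entries are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical keyword dictionaries from English into each candidate language.
KEYWORD_DICTS: Dict[str, Dict[str, str]] = {
    "fr": {"wine": "vin", "harvest": "vendange"},
    "de": {"wine": "wein", "harvest": "ernte"},
}

# Toy stand-in for search engine database 25: native-language keyword -> page URLs.
INDEX: Dict[str, Set[str]] = {
    "wine": {"http://example.com/en/wine"},
    "vin": {"http://example.com/fr/vin"},
    "wein": {"http://example.com/de/wein"},
}


def translate_keywords(keywords: List[str], lang: str) -> List[str]:
    """Translate each keyword that the dictionary covers; drop the rest."""
    table = KEYWORD_DICTS.get(lang, {})
    return [table[k] for k in keywords if k in table]


def multilingual_search(keywords: List[str], langs: List[str]) -> Set[str]:
    """Query with the original keywords and with their translations (OR semantics)."""
    queries = [keywords] + [translate_keywords(keywords, lang) for lang in langs]
    hits: Set[str] = set()
    for query in queries:
        for keyword in query:
            hits |= INDEX.get(keyword, set())
    return hits
```

The merged result set could then have its abstracts passed back through the translation server so the user sees them in the first language; that step is omitted here.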

The interface server 16 can also include search engine functionality to conduct 
a search of the Internet or other network for translation dictionaries, or a search of a 
dictionary database 22 as part of the present invention, described hereinafter. 

Machine Translation using Specialized Dictionaries 

As shown in Figs. 1 through 3, the machine translation system of the present
invention also includes a dictionary database 22 capable of storing dictionaries 24 for
a number of core language pairs 24A as well as for individual subject matter domains
24B, sub-domains or sub-languages 24C, and user-specific domains 24D. For
purposes herein, a core language pair refers to the combination of (1) the language of
the communication to be translated, or source language, and (2) the language into
which the communication is to be translated, or target language. For example,
standard core dictionaries 24A may include English-to-French, German-to-Japanese,
Korean-to-English, and many other possible combinations of language pair
dictionaries and translation engines.

Each sub-language in the dictionary database is chosen to have a manageable
size, predictable modes of expression and syntactic structures, and a well-understood
context for disambiguation of homonyms, polysemic phrases, and specialized
references. It should be noted that, in the machine translation field, the term
"sub-language" usually refers to a recognized domain having a defined set of terms and
patterns of language usage that characterize that domain. In the present invention,
"sub-language" or "sub-domain" is used more loosely to refer to any set of terms and
patterns of usage attributed to a field of usage, a group of users, or even an individual
user. A sub-language dictionary can thus be set up whenever a preferred set of terms
and usages is identified. In addition to being set up by domain or field and by
sub-domain or sub-field, sub-language dictionaries can be set up corresponding to socially
determined usages or particular contexts, or for a given type of
correspondence, such as business or social.

As an example, within each language pair dictionary category, there may be
domain and sub-domain dictionaries, such as investing and bonds, sports and soccer,
home construction and plumbing, and music and classical. Even more
specific may be the user's own stored dictionary of terms or expressions and
equivalent translated terms or expressions. Such a specific user dictionary may have
value in a particular Internet discussion group, a work group, a collaboration team
group, or another small unit requiring particular translation dictionaries not otherwise
available. User dictionaries need not be domain or sub-domain specific, and can be
created by the user within the realm of a language pair dictionary, as shown in Fig. 3.
In one embodiment of the present invention, all dictionaries (domain, sub-domain,
and user) can be stored in the dictionary database 22 accessible by the translation
server 18. Each of the dictionaries stored in the dictionary database can be built and
stored using a prescribed format for ease of manipulation by the machine translation
server.
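
The layered organization of dictionary database 22 (core language pair 24A, domain 24B, sub-domain 24C, user 24D) can be sketched as nested lookups. The precedence used here, most specific dictionary first and the core pair dictionary last, is an assumption for illustration; the text does not fix a lookup order. The class and field names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

LangPair = Tuple[str, str]  # (source language, target language)
Entries = Dict[str, str]    # source term -> target term


class DictionaryDatabase:
    """Hypothetical layered store: core pair, domain, sub-domain, and user dictionaries."""

    def __init__(self) -> None:
        self.core: Dict[LangPair, Entries] = {}
        self.domain: Dict[Tuple[LangPair, str], Entries] = {}
        self.subdomain: Dict[Tuple[LangPair, str, str], Entries] = {}
        self.user: Dict[Tuple[LangPair, str], Entries] = {}

    def lookup(self, term: str, pair: LangPair, domain: Optional[str] = None,
               subdomain: Optional[str] = None, user: Optional[str] = None) -> Optional[str]:
        # Assumed precedence: user, then sub-domain, then domain, then core pair.
        layers: List[Entries] = []
        if user:
            layers.append(self.user.get((pair, user), {}))
        if domain and subdomain:
            layers.append(self.subdomain.get((pair, domain, subdomain), {}))
        if domain:
            layers.append(self.domain.get((pair, domain), {}))
        layers.append(self.core.get(pair, {}))
        for layer in layers:
            if term in layer:
                return layer[term]
        return None  # no dictionary covers the term
```

For an English-to-French pair, a core entry "bond" -> "lien" would be overridden by an investing-domain entry "bond" -> "obligation" whenever the investing domain is selected.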

Dictionary building, storing, and enhancement

Sub-language dictionaries can be established and enhanced with dictionary-building
tools currently used in machine translation, such as the ECS/MT™
system tools. The ECS/MT system allows the user to create a dictionary for a given
language pair including technical terms for a chosen sub-language, and provides a
rule editor, a dictionary maintenance utility, a translation module, a morphology
module and a semantic preference component.

The rule editor allows a linguist to create and modify morphological rules,
phrase structure rules, and transfer rules for the sub-language. The dictionary
maintenance utility allows creation and modification of lexical entries, including
source entries, target entries, and source-to-target transfer entries in the dictionary.
The translation module performs table-driven translation using linguistic tables,
analysis rules, transfer rules, and semantic preference entries that have been compiled
into the dictionary. The morphology module applies rules to analyze morphologically
complex words to determine uninflected forms for dictionary lookup of source lexical
items and to generate morphologically complex words in the target language. The
semantic preference component operates on preferred semantic relations, the
assignment of semantic attributes to lexical items, and the accessibility and matching
of these attributes for lexical disambiguation and selection of preferred translations.
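
The lookup side of the morphology module, reducing an inflected source word to an uninflected form before consulting the dictionary, can be illustrated with simple suffix-stripping rules. This is a swapped-in technique, not the ECS/MT morphology module: the English rules, the final-"e" retry heuristic, and the function names are assumptions for illustration only, and target-side generation is not covered.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical morphological rules: (inflectional suffix, replacement), tried in order.
RULES: List[Tuple[str, str]] = [
    ("ies", "y"),  # studies -> study
    ("ing", ""),   # translating -> translat, repaired by the final-"e" retry below
    ("ed", ""),
    ("es", ""),
    ("s", ""),
]


def candidate_lemmas(word: str) -> List[str]:
    """Generate uninflected candidates for dictionary lookup, surface form first."""
    w = word.lower()
    candidates = [w]
    for suffix, replacement in RULES:
        if w.endswith(suffix) and len(w) > len(suffix) + 1:
            stem = w[: -len(suffix)] + replacement
            candidates += [stem, stem + "e"]  # also try restoring a dropped final "e"
    return candidates


def lookup(word: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Return the translation of the first candidate form found in the lexicon."""
    for candidate in candidate_lemmas(word):
        if candidate in lexicon:
            return lexicon[candidate]
    return None
```

Overgenerating candidates and letting the lexicon decide keeps the rules trivial; a production analyzer would instead consult rule tables compiled into the dictionary, as the text describes.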

In one embodiment of the present invention, the dictionary building tools can
be accessed over the Internet using an Internet browser. In this way, users who are
qualified to add to or modify a particular dictionary in the database 22 can augment
and improve the accuracy of interpretations for the benefit of those subsequently
using that dictionary. In one embodiment, access to dictionaries is controlled by a
central registration authority which limits access to authorized individuals. In another
embodiment, an application programming interface (API) is provided to allow users
to interface with the dictionaries regardless of the computer system, hardware, or
software being employed. The APIs can be provided with libraries of tools
commonly known in the art for building dictionaries. In this way, a particular
sub-language's capability is developed and accumulated over time based upon the
encountered words and identified preferences of actual users, user groups, domains, or
fields. Thus, the dictionary building interface of the present invention can facilitate
peer-to-peer networking among builders of specialized dictionaries.
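
A system-independent dictionary API gated by a central registration authority might look like the following minimal sketch. The classes `RegistrationAuthority` and `DictionaryAPI`, the dictionary identifier format, and the grant model (one grant per user and dictionary) are hypothetical assumptions, not the system's actual interface.

```python
from typing import Dict, Set, Tuple


class RegistrationAuthority:
    """Hypothetical central authority recording which users may edit which dictionaries."""

    def __init__(self) -> None:
        self._grants: Set[Tuple[str, str]] = set()  # (user, dictionary id)

    def grant(self, user: str, dictionary_id: str) -> None:
        self._grants.add((user, dictionary_id))

    def may_edit(self, user: str, dictionary_id: str) -> bool:
        return (user, dictionary_id) in self._grants


class DictionaryAPI:
    """Hypothetical interface through which authorized users augment dictionaries."""

    def __init__(self, authority: RegistrationAuthority) -> None:
        self.authority = authority
        self.dictionaries: Dict[str, Dict[str, str]] = {}

    def add_entry(self, user: str, dictionary_id: str, source: str, target: str) -> None:
        # Every modification is checked against the registration authority first.
        if not self.authority.may_edit(user, dictionary_id):
            raise PermissionError(f"{user} is not authorized to edit {dictionary_id}")
        self.dictionaries.setdefault(dictionary_id, {})[source] = target
```

Separating the authority from the API lets the same permission check serve browser-based tools and programmatic clients alike.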

In certain instances, the dictionary database of the present invention will not
contain a specific sub-domain dictionary for a given topic or subject. In such cases, in
one embodiment of the invention, the system of the present invention can provide
search agents as part of the interface server 16 to search the Internet for such a
dictionary, as shown generally at 30 in Fig. 1. The search agents may be employed in
a manner commonly known in the art.

Sub-domain Dictionary Search 

Upon finding an appropriate dictionary 30 over the Internet or other network
for the given request, the present invention can invoke a software interface to allow
the machine translation server to communicate with and use the newly found
dictionary, and to translate the desired text for delivery in accordance with the user's
request. The interface can be a software routine, for example, which converts the
format of the found dictionary 30 into a format which is readily understood by the
translation server 18.
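
As one example of such a conversion routine, the sketch below assumes the found dictionary 30 is a delimited text glossary with a header row, and normalizes it into a simple source-to-target mapping. The column names, the delimiter, and the keep-first-translation policy are assumptions about the external format; only Python's standard `csv` and `io` modules are used.

```python
import csv
import io
from typing import Dict


def convert_found_dictionary(raw: str, source_col: str, target_col: str,
                             delimiter: str = ",") -> Dict[str, str]:
    """Convert an external delimited glossary into a source -> target mapping.

    The header names and delimiter describe the assumed external format.
    """
    reader = csv.DictReader(io.StringIO(raw), delimiter=delimiter)
    entries: Dict[str, str] = {}
    for row in reader:
        src = (row.get(source_col) or "").strip().lower()
        tgt = (row.get(target_col) or "").strip()
        if src and tgt and src not in entries:  # skip blanks; keep the first translation seen
            entries[src] = tgt
    return entries
```

Columns are selected by name, so the same routine can read glossaries whose columns appear in either order, as in a French-first file read for English-to-French use.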

The system of the present invention can also store the interactions of each user
in a user file, which can be recalled each time the particular user accesses the system.
The system may recognize the user through a cookie or cookies left on the user's
computer system when accessing the system of the invention via the Internet, for
example, or the user may be recognized through the user providing identification
information such as an e-mail address, account name, or password, for example. Such
user information can be used to help predict which dictionary is most appropriate for
the given user's request. The user file can be stored in a database accessible by the
translation server. In one embodiment of the invention, user files can be stored in the
dictionary database 22.
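
One simple way a user file could help predict the most appropriate dictionary is to count past dictionary selections per recognized user and propose the most frequent one. The frequency heuristic and the `UserFiles` class are hypothetical illustrations; the text does not specify how the prediction is made.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Optional


class UserFiles:
    """Hypothetical per-user history of dictionary selections."""

    def __init__(self) -> None:
        self._history: DefaultDict[str, Counter] = defaultdict(Counter)

    def record(self, user_id: str, dictionary_id: str) -> None:
        """Log that a recognized user (e.g. by cookie or e-mail) used a dictionary."""
        self._history[user_id][dictionary_id] += 1

    def predict(self, user_id: str, default: Optional[str] = None) -> Optional[str]:
        """Suggest the user's most frequently used dictionary, else a default."""
        counts = self._history.get(user_id)  # .get avoids creating empty entries
        if not counts:
            return default
        return counts.most_common(1)[0][0]
```

A frequency count is only a starting prior; it would typically be combined with the topic detection described below.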

Machine Translation Method 

The translation server 18 or engine may employ a conventional transfer-type
system, an interlingua system, or another system of translation as is well known in the
art of machine translation. Because the machine translation server 18 is provided with the
most appropriate dictionary during the translation process as described herein, the
method used to effectuate the machine translation is less consequential to the
quality of the results.

Topic Detection and Context Recognition 

In one aspect, the present invention provides a real-time translation system
employing topic detection and context recognition. Traditionally, real-time
translation has faced formidable obstacles, probably the greatest of which is word-sense
disambiguation, and the related problem of translation divergences. Domain-specific
lexicons, regardless of their quality and number, can offer only limited
improvements in real-time machine translation (MT) quality if they cannot be
accessed when needed. One advantage of the present invention is the ability of the
system to automatically detect topic changes so that on-line domain-specific
dictionaries can be accessed automatically in real time.

Topic detection and tracking (TDT) can involve several tasks, including
segmenting text into its constituent stories, identifying original topics, and matching
topics to those already identified (tracking). The segmentation task can be
approached by a variety of techniques, including Hidden Markov Models. Under this
approach, identifying topics in a text stream is similar to recognizing speech in an
acoustic stream, whereby the hidden states are topics and the observations are words
or sentences. An alternative to this approach is local context analysis (LCA). In this
approach, a database of content words is consulted for each sentence and associated
concepts are returned. Sentences are compared on the basis of common concepts, not
shared words. The tracking task is similar to the standard routing and filtering tasks
of information retrieval (IR). Each subsequent concept is "matched" to a previous
concept using similarity measures.
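
The tracking step can be sketched as matching each sentence's concept set (for example, the concepts an LCA-style lookup returns) against the concept profiles of topics seen so far, using cosine similarity as the assumed IR measure. The threshold value and the `track` function are hypothetical, and the concept lists in the usage example are invented inputs rather than output of any real concept database.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-concepts vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def track(sentence_concepts: List[List[str]], threshold: float = 0.3) -> List[int]:
    """Assign each sentence a topic id by matching its concepts to earlier topics."""
    topics: List[Counter] = []  # accumulated concept profile per topic
    labels: List[int] = []
    for concepts in sentence_concepts:
        vec = Counter(concepts)
        scores = [cosine(vec, topic) for topic in topics]
        if scores and max(scores) >= threshold:
            best = scores.index(max(scores))
            topics[best].update(vec)  # tracking: fold the sentence into the matched topic
        else:
            topics.append(Counter(vec))  # no match: a new topic is identified
            best = len(topics) - 1
        labels.append(best)
    return labels
```

For example, the concept lists `[["finance", "market"], ["market", "stock"], ["sport", "team"], ["finance", "stock"]]` are labeled `[0, 0, 1, 0]`: the third sentence opens a new topic and the fourth returns to the first.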

The present invention proposes a new technique for topic identification based
on matching content words in the input stream to nodes in an ontological database.
An ontological database is a hierarchically organized lexicon, much like a thesaurus.
It contains lexical items classified according to various inter-lexical relationships such
as hyponymy/hypernymy (i.e., sub-category/super-category), meronymy/holonymy
(part/whole), and synonymy/antonymy. By way of example, the WordNet ontology
can be used for tasks relating to text categorization, machine translation and word-sense
disambiguation. The present invention can employ ontologies for topic
detection in real-time speech and text translation.

Topic detection has not previously been thought of as a natural candidate for
knowledge-based approaches. Ontologies (and other lexical knowledge bases such as the
Cycorp™ Cyc KB) are lexical hierarchies organized according to a specific set of
principles. These principles include classifying words according to sub-classes and
super-classes, not topics. Because superclasses do not stand in a topic-subtopic
relationship to their subclasses, ontological classes are not considered good topic
indicators.
The present invention does not use ontological categories directly as topic
indicators. Rather, each content word in the input sequence is associated with a set of
both hypernyms (the superclass of the word's class) and holonyms (the whole of
which the word represents a part). The resulting set will be used to match a set of
possible topics. Overlap in the hypernym/holonym sets of successive words in an
n-gram window will be used as input to a threshold indicator that selects the topic from
a pre-defined list.
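The overlap-and-threshold step described above can be illustrated with a minimal sketch. The toy ontology, the topic list, the window size of three and the threshold of two are hypothetical values chosen for illustration only; a deployed system would draw the hypernym and holonym sets from an ontological database such as WordNet.

```python
from collections import Counter

# Hypothetical toy ontology: each content word maps to its hypernym and holonym
# nodes. A real system would query WordNet or a similar ontological database.
ONTOLOGY = {
    "pipeline": {"conduit", "oil_and_gas_infrastructure"},
    "wellhead": {"equipment", "oil_and_gas_infrastructure"},
    "drilling": {"operation", "oil_and_gas_infrastructure"},
    "patient": {"person", "health_care"},
    "dosage": {"quantity", "health_care"},
}

# Hypothetical pre-defined topic list, keyed by the ontology node that signals it.
TOPICS = {"oil_and_gas_infrastructure": "oil and gas", "health_care": "health care"}


def select_topic(words, window=3, threshold=2):
    """Slide an n-gram window over the words and return the first listed topic
    whose node is shared by at least `threshold` words in one window."""
    node_sets = [ONTOLOGY.get(w.lower(), set()) for w in words]
    for start in range(max(1, len(node_sets) - window + 1)):
        # Count how many words in this window share each hypernym/holonym node.
        overlap = Counter(n for nodes in node_sets[start:start + window] for n in nodes)
        for node, count in overlap.most_common():
            if count >= threshold and node in TOPICS:
                return TOPICS[node]
    return None  # not enough overlap: no topic selected yet
```

For instance, `select_topic(["pipeline", "wellhead", "pressure"])` selects "oil and gas" after only three words, which reflects the small amount of context this approach needs.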

The advantage of the present system is that, unlike statistical topic detectors, the
present invention needs very little context to make a topic selection.

In one embodiment, the present invention matches each word (following stop-list
processing) to a node in the ontological database. The output of this process is all
the hypernym and holonym nodes associated with each word $w$. The resulting vector
$w^{H}(j,k) + w^{O}(j,k)$ comprises a context-set that is then matched to a
corresponding pre-defined topic tree. Each node in the topic tree is defined by a
similar vector, and the two are matched by the type of IR algorithm used in tracking.
A set of common hypernym/holonym links in an n-gram window of input words can
be used (instead of matching each single word), but the window size would have to be
minimized to increase processing speed. With this technique, a minimum of actual
context is necessary before a topic is identified.
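Matching a context-set against topic-tree nodes can be sketched with cosine similarity, one common IR similarity measure. The specification does not name the exact algorithm, so the choice of cosine similarity, the node-count vectors and the example topic nodes below are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse node-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def context_set(word_nodes):
    """Sum the hypernym and holonym nodes of every word into one vector,
    mirroring the w^H + w^O context-set described above."""
    vec = Counter()
    for hypernyms, holonyms in word_nodes:
        vec.update(hypernyms)
        vec.update(holonyms)
    return vec


def best_topic(word_nodes, topic_tree):
    """Return the (topic, score) pair whose node vector is most similar."""
    ctx = context_set(word_nodes)
    return max(((t, cosine(ctx, v)) for t, v in topic_tree.items()), key=lambda p: p[1])


# Hypothetical topic-tree nodes, each described by a vector of ontology nodes.
TOPIC_TREE = {
    "telecom": Counter({"device": 1, "network": 2}),
    "health care": Counter({"person": 1, "hospital": 2}),
}
```

Here each input word contributes a pair of node sets (hypernyms, holonyms); the topic node with the highest similarity score is chosen.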

Context recognition in real time helps eliminate erroneous word choices where a
word has multiple meanings in the source language, by determining in real time which
connotation should be selected in the target language. The word "reservation", for
example, may mean an Indian reservation, a restaurant reservation, or a personal
reservation in the sense of compunction. The translation would be accurate only if
the context was identified in advance in order to select the correct connotation in the
target language's dictionary. Lack of context sensitivity, that is, failure to select
appropriate domain-specific dictionaries with the right connotation, is therefore a
major flaw in the current state of the art of machine translation.






Accuracy of word choice for machine translation in Japanese, Korean, Arabic,
Russian, Urdu and Farsi, as well as more common languages such as Spanish and
Chinese, can rise dramatically. Rapid prototyping of new machine translation pairs for
emergency use, such as Urdu-English or Bosnian-English, can use customized
dictionaries in accordance with the present invention for domain-specific dialogues,
or for dialogues, news feeds or instant messaging in which the topic is changing rapidly
and frequently, in real time. Manual selection of dictionaries, by contrast, is not
feasible in real time, particularly where the language being translated is not understood
by the person manually selecting a topic-specific dictionary.

As shown in Fig. 2, in the context of speech-to-speech translation, incoming
speech 101 is converted to text by speech-to-text converter 103, which then forwards
the text to translation engine 105. After the engine has formatted the text to be
translated, the present invention's context recognition tools enable the server to (1)
identify the subject matter and automatically select the correct online dictionary so that
the translation is context-sensitive, and (2) detect the correct language using a statistical
algorithm in order to bring in the language translation engine corresponding to the
source language. Specific lexicons can include the telecom, health care, and oil and gas
industries, for example. In one embodiment, the present invention rapidly changes
dictionaries on the fly, without user-assisted, menu-driven functions. The language
tools needed to achieve such an increase in accuracy include translation
memory, customized dictionaries, summarization, and caching memory for enhancing
instant messaging.
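Step (2), statistical language detection, is often done with character n-gram models. The following is a minimal sketch of one such approach, a character-trigram classifier with add-one smoothing; the specification does not name its algorithm, and the two training sentences are hypothetical and far too small for production use.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of the lower-cased text, padded with spaces."""
    padded = f"  {text.lower()}  "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


class LanguageIdentifier:
    """Character-trigram language identifier with add-one smoothing."""

    def __init__(self, samples):
        # samples: language code -> training text
        self.profiles = {lang: Counter(trigrams(txt)) for lang, txt in samples.items()}
        # Vocabulary size plus one slot for unseen trigrams.
        self.vocab_size = len(set().union(*self.profiles.values())) + 1

    def log_score(self, text, lang):
        """Log-probability of the text's trigrams under one language profile."""
        profile = self.profiles[lang]
        total = sum(profile.values())
        return sum(math.log((profile[g] + 1) / (total + self.vocab_size))
                   for g in trigrams(text))

    def identify(self, text):
        """Return the language code with the highest log-probability."""
        return max(self.profiles, key=lambda lang: self.log_score(text, lang))


# Hypothetical, deliberately tiny training samples.
SAMPLES = {
    "en": "the gas plant is near the power station and the pipeline",
    "es": "la planta de gas está cerca de la central eléctrica y el gasoducto",
}
```

The detected language code would then be used to route the text to the matching translation engine.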

As shown in Fig. 2, as part of translation gateway 12, translation engine 105 is
in communication with topic detection subsystem 106, and specifically with a lexicon
switching component 107, which is capable of parsing the text through the
appropriate dictionary 24 from database 22 based on the topic determined by
topic lexicon matching component 109. Topic lexicon matching component 109 is
capable of matching the topic of the text input with an established lexicon using
statistical topic detection, ontological topic detection, or both. A statistical
topic detection component 111 and an ontological topic detection component 113 are
provided in communication with topic lexicon matching component 109. It will be
appreciated that the components of translation server 18 and topic detection
subsystem 106 can be software (e.g., Java™ programs), hardware elements (e.g.,
ASICs), or a combination of software and hardware. The topic detection
methods operate as previously discussed. Once the topic is determined, the dictionary
selection program can be activated, and a domain-specific dictionary is selected
by the lexicon matching component program. The text is then translated and passed
to text-to-speech converter 117, whereupon the speech 119 can be played on
appropriately outfitted devices, such as a cellular telephone, for example.
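The behaviour of the lexicon switching component can be sketched as follows: once a topic is detected, look words up in the matching domain dictionary first and fall back to the core language-pair dictionary. The class name, its fields and the English-Spanish entries for "reservation" are hypothetical illustrations, not the patented implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LexiconSwitcher:
    """Hypothetical lexicon switching component: picks a domain dictionary for
    the detected topic and falls back to the core language-pair dictionary."""
    core: dict[str, str]
    domains: dict[str, dict[str, str]] = field(default_factory=dict)
    active_topic: Optional[str] = None

    def switch(self, topic: str) -> None:
        # Switch only when a domain dictionary exists for the detected topic.
        self.active_topic = topic if topic in self.domains else None

    def lookup(self, word: str) -> str:
        if self.active_topic is not None:
            entry = self.domains[self.active_topic].get(word)
            if entry is not None:
                return entry
        # Fall back to the core dictionary; unknown words pass through unchanged.
        return self.core.get(word, word)


# Illustrative English-to-Spanish entries for the "reservation" example above.
switcher = LexiconSwitcher(
    core={"reservation": "reserva", "table": "mesa"},
    domains={"land": {"reservation": "reserva indígena"},
             "opinion": {"reservation": "reparo"}},
)
```

Calling `switcher.switch("land")` before translating makes "reservation" resolve to the land-related sense, while an unrecognized topic leaves the core dictionary in effect.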

It will be appreciated that the translation in Fig. 2 can occur without speech
inputs and outputs. For example, the text inputs can be obtained via an e-mail
message, instant message, SMS message or the like, and the output delivered in the
same manner in which the input arrived. Also, while the diagram in Fig. 2 shows
one-way data flow, the present invention can operate to provide two-way data flow.


The present invention can be employed in the creation and use of in-house 
access programs and integration systems. In one embodiment, an off-the-shelf 
speech-to-text system can be integrated with the translation component of the present 
invention. 


Thus, in one embodiment, the present invention provides a context detection
system whose output can be ported to a topic database for topic selection. The topic
selection can then be input into a program that switches domain-specific
dictionaries in real time. The domain-specific dictionary is ported seamlessly into the
translation engine, and a corresponding domain-specific dictionary in the target
language is then chosen.



The present invention can be implemented using a plurality of computer
programs working sequentially in the following way: (1) the input sequence is
processed to remove stop-words; (2) each element in the output sequence is matched
to nodes in an ontology (or fed to a clustering algorithm in the stochastic topic
detection method of the present invention); (3) the resulting list of nodes (either for
each word, or the list of common nodes from an n-gram window) is compared against a
Topic Database; (4) the topic-activation threshold is calculated; (5) a topic is selected;
and (6) the lexicon switcher switches lexicons. The ontology or knowledge base is
accessed by a program in accordance with the present invention that matches content
words from the input data, producing a term vector as output. Another program uses
this output string as input to the program that manages topic-association thresholds.
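The six-step sequence above can be sketched as a single function. This is a minimal illustration under stated assumptions: the stop-word list, the toy ontology and Topic Database passed in, and the "share of the term vector" activation measure with a 0.5 threshold are all hypothetical, and the clustering path of the stochastic method is not shown.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for"}


def remove_stop_words(tokens):                       # step (1)
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def match_nodes(tokens, ontology):                   # step (2)
    # Term vector: counts of ontology nodes reached by the content words.
    vec = Counter()
    for t in tokens:
        vec.update(ontology.get(t.lower(), ()))
    return vec


def topic_activations(vec, topic_db):                # steps (3) and (4)
    # Activation = share of the term vector covered by each topic's nodes.
    total = sum(vec.values()) or 1
    return {topic: sum(vec[n] for n in nodes) / total for topic, nodes in topic_db.items()}


def run_pipeline(text, ontology, topic_db, switch_lexicon: Callable[[str], None],
                 threshold=0.5) -> Optional[str]:
    tokens = remove_stop_words(text.split())
    activations = topic_activations(match_nodes(tokens, ontology), topic_db)
    # Step (5): pick the most activated topic, if it clears the threshold.
    topic, score = max(activations.items(), key=lambda kv: kv[1], default=(None, 0.0))
    if topic is not None and score >= threshold:
        switch_lexicon(topic)                        # step (6)
        return topic
    return None
```

The `switch_lexicon` callback stands in for the lexicon switcher, so the same pipeline can drive any dictionary-switching component.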

Dictionary Organization and Selection 

As shown in Figs. 1 and 3, core language dictionaries and a plurality of
sub-language dictionaries are maintained in the system's dictionary database. The system
can provide dictionary selection based upon analysis of the text to be translated and
other factors, such as the user's prior use of particular dictionaries in the system. For
example, if a particular user seeks to run an Internet search in a foreign language for
South American natural gas power plants and seeks related news articles in Spanish,
the user is actually seeking two translations in accordance with the present invention.
First, the user's keywords must be translated and a search conducted on the translated
keywords. Then, the returned web sites and web pages must be translated from
Spanish to English so the user can read the articles. The presentation of the request
and the returned web page is done in accordance with the methods described
elsewhere herein. Selection of the appropriate dictionary is critical to the
accuracy and ultimate success of the web search or other request made by the user. In
one embodiment, the present invention provides a domain-specific lexicon builder
component that can build new dictionaries and enhance previously established ones
through manual input and categorization of terms based on a defined domain.

In the present example, the user's English keywords "South American
natural gas power plants", along with the target language "Spanish", would be used
to locate the most appropriate sub-domain dictionary in the dictionary database. First,
the system of the present invention would locate all of the sub-domain dictionaries
within the core language pair of English-Spanish. Then, the contextual dictionary
locator component would search variations of the phrase "South American natural gas
power plants" and, through several iterations and variations on the input text, the
sub-domain dictionary determined to provide the best fit would be accessed to create
the Spanish translation. At this point, the search on the Internet would be initiated.
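One way the contextual dictionary locator might score sub-domain dictionaries is sketched below: generate contiguous sub-phrases of the keywords as the "variations", then favour the dictionary in the requested language pair whose term list covers the most words. The dictionary layout, the names and the length-weighted score are assumptions made for illustration only.

```python
def phrase_variations(phrase: str):
    """All contiguous word sub-sequences of the phrase, longest first."""
    words = phrase.lower().split()
    n = len(words)
    return [" ".join(words[i:i + k]) for k in range(n, 0, -1) for i in range(n - k + 1)]


def locate_dictionary(phrase, source, target, dictionaries):
    """Pick the sub-domain dictionary for (source, target) whose entries cover the
    most phrase variations, weighting longer matches more heavily.

    dictionaries: hypothetical mapping (source, target, name) -> set of source terms.
    Returns None when nothing matches, meaning: use the core language-pair dictionary.
    """
    candidates = {name: terms for (src, tgt, name), terms in dictionaries.items()
                  if (src, tgt) == (source, target)}
    best_name, best_score = None, 0
    for name, terms in candidates.items():
        score = sum(len(v.split()) for v in phrase_variations(phrase) if v in terms)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical sub-domain dictionaries within two language pairs.
DICTS = {
    ("en", "es", "energy"): {"natural gas", "power plants", "pipeline"},
    ("en", "es", "geography"): {"south american"},
    ("en", "fr", "energy"): {"natural gas", "power plants"},
}
```

With these entries, "South American natural gas power plants" selects the English-Spanish energy dictionary, because "natural gas" and "power plants" together outweigh the single geographic match.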

When the Internet search results are returned, the user may want one or more
of the returned references to be translated back into English. To do so most
accurately, the system of the present invention may incorporate a dictionary or
sub-language dictionary within the Spanish-to-English language pair, such as a
Spanish-English energy industry dictionary or a Spanish-English natural gas dictionary,
for example. In one embodiment of the invention, the user may be provided with a
choice of two or more sub-language dictionaries contained in the dictionary database
of the present invention. In a further embodiment of the invention, where the
dictionary database does not contain a relevant sub-language dictionary, the core
language pair dictionary is employed. Alternatively, the system of the present
invention may search the Internet for an appropriate substitute dictionary to be
employed to give the greatest contextual accuracy to the translation, as previously
described. It will be appreciated that the user may at any time request that an Internet
search be performed in order to discover a more contextually appropriate sub-language
dictionary, or in order to invoke a user-known dictionary accessible on the Internet.

The dictionary selection process in the example above may occur as a result of
the keywords provided by the user. Where the user does not provide keywords, the
text to be translated can be analyzed by words, phrases, proper names, geographic
location, or any other method of inferring an appropriate sub-dictionary from the text
or the context of the given text to be translated. The ability to determine an
appropriate dictionary through context recognition is essential wherever highly
accurate translations are required. By actively recognizing the context of the text to
be translated, the system of the present invention removes the need for the user to
select a sub-domain dictionary. In some cases, the user may know which sub-domain or
specialized dictionary would be most appropriate, and in such cases the present
invention allows the user to so designate. In many other cases, however, the user will
be requesting translation of text from a language the user does not understand into a
language the user does understand. In such cases, the user is severely disadvantaged in
trying to select a specialized dictionary, and automatic selection by the present
invention becomes especially valuable to the user.

Incorporation of external dictionaries 

As shown in Fig. 7, the system of the present invention can also provide
functionality to assist in compensating owners of external specialized or other
translation dictionaries. For example, when the system locates a relevant dictionary
on the Internet upon searching, as at 80, it identifies the URL (uniform resource
locator) or address where the dictionary is found, as at 82. This URL can be stored by
the system for future analysis and information gathering. Next, a system or network
operator associated with the present system can be notified, as at 84, of the
URL of the found dictionary and any further information collected about the
dictionary. The system or the system operator can then determine whether the
dictionary is freely available to the public or is proprietary and not subject
to free use, as at 86. If the dictionary is considered to be in the public domain, the
system can conduct the translation of the desired text using the system interface and the
translation server, as at 88. If the dictionary is proprietary, the system of the present
invention can generate a license agreement and forward it to the owner of the
dictionary, as discovered through conventional means, as at 90. Once an agreement is
in place, as at 92, the system of the present invention can proceed with translations
using the dictionary, as previously described.
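The Fig. 7 decision flow can be summarized as a small function. The types, the `notify` and `send_license_offer` callbacks and the example URLs are hypothetical; the comments map each branch to the reference numerals in the figure.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    PUBLIC = "public"
    PROPRIETARY = "proprietary"


@dataclass
class FoundDictionary:
    url: str                 # step 82: URL where the dictionary was found
    access: Access           # result of the step 86 determination
    licensed: bool = False   # True once an agreement is in place (step 92)


def handle_found_dictionary(d: FoundDictionary, notify, send_license_offer) -> bool:
    """Walk the Fig. 7 flow; return True when translation may proceed."""
    notify(d.url)                    # step 84: tell the operator about the URL
    if d.access is Access.PUBLIC:    # step 86: public domain?
        return True                  # step 88: translate with the dictionary
    if not d.licensed:
        send_license_offer(d.url)    # step 90: forward a license agreement to the owner
        return False                 # wait until the agreement is in place
    return True                      # step 92: agreement in place, proceed
```

Step 80, the Internet search that finds the dictionary, is assumed to have happened before this function is called.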

The system of the present invention can also be used to provide compensation
terms as part of any licensed dictionary. Such compensation terms may be
determined based upon frequency of need for the dictionary, accuracy of results using
the dictionary, and other factors. Further, the system of the present invention can
employ methods of electronic payment as known in the art to compensate dictionary
owners electronically.

Input Data reception

The system of the present invention is designed to receive requests in many
formats and of many types. In one embodiment, the receiving interface receives input
text as electronic machine-readable text over a communications line, or as page image
data via a fax/modem board or page scanner. The receiving interface operates in a
computer server along with a recognition module for converting any page image data
to electronic text. The recognition module scans and recognizes designations in the
input text for determining the selected source/target languages and sub-languages
applicable to the input text. In the case of electronic text, the input text
may be introduced by means of a disk file, by downloading an electronic file, or by
online user-system interaction. In a preferred embodiment, the input is interactive,
whereby the user is prompted for information concerning user identity, sub-language
preferences, source and target languages, and other items to facilitate the translation.
Inferencing algorithms may be used to assess the user and textual information and
determine the applicable sub-language dictionary or dictionaries.

Fig. 4 shows an example input screen for use in connection with the present
invention. As shown therein, the user may be prompted to provide the source
language 110, target language 112, and the text to be translated 122. The user may
optionally be prompted to provide a selection of a particular dictionary 114 within the
dictionary database, the URL of a known translation dictionary on the Internet 116,
keyword search terms 118 for an Internet search, the URL of a web page to be
translated 120, if desired, and the e-mail address 124 of an individual who is to
receive a translation of the entered text. The items represented in Fig. 4 are not an
exhaustive list of the items that may appear on a user's browser for input into the
system of the present invention and are provided by way of example. Also, the
method by which the user inputs the information can vary, and can
include open text boxes and drop-down menus, for example. Various action buttons
126 can also be provided which enable pre-defined search, translate, and transfer
functions upon user input, such as a mouse click, as is widely known in the art.



A user's remote device may have a similar interface to the extent there is
available screen space. Otherwise, the remote device may offer a subset of the
selection options shown in Fig. 4. In one embodiment, a user's remote (e.g., wireless)
device may include action buttons and/or selection icons for SMS messaging 115 or
instant messaging 117, as shown in dashed lines.

Input requests can include (1) translating and transferring text from the user in
the user's language (source language) to the user's desired recipient in the recipient's
language (target language); (2) translating and transferring the text of a given web
page in a source language to the user in the user's language (target language); (3)
translating a document, short message service (SMS) message or e-mail; and (4)
searching for information on the Internet where the search is begun using keywords in
a first language that are translated into a second language, whereupon the search can be
conducted effectively in the second language. Each desired function can be executed
in accordance with the methods previously described in connection with Fig. 1.

The system of the present invention can be used for many applications
requiring highly accurate language translation functionality. As shown in
Fig. 5, for example, the system of the present invention can be used to translate and
transfer communications in accordance with a user's preferences. In this example, the
system accepts as inputs (step 130) the source and target languages as designated by
the user, as well as the text of the communication to be translated. The input text can
be an electronic file, text entered by the user through the browser interface, or another
form of electronic text as previously described. In one embodiment of the present
invention, the system can recognize the source language of the user's text automatically
through character recognition techniques. At step 132, the system can determine
whether the user has previously used or stored a dictionary within the system. This
may be done through the use of a cookie or another method whereby the system can
recognize the identity of the user accessing the system through their Internet browser.
This may also be done by direct input from the user on the graphical user interface
available upon accessing the system. If the user has previously used or stored a
specialized dictionary, it can be offered to the user as an optional dictionary to be used
in translating the user's communication, as at 134. In one embodiment of the
invention, the system of the present invention may give added consideration to the
particular previously used or stored specialized dictionary or dictionaries in
determining the appropriate specialized dictionary to employ for the user's particular
request. This may result in a quicker determination by the system of the specialized
dictionary to employ, especially as the system of the present invention adds more and
more specialized dictionaries.

If the user has not previously used or stored a dictionary, or if the previously
used or stored dictionary is determined not to be appropriate as at step 136, the
context of the input text is analyzed, as at step 138. Based on the contextual
analysis of the text to be translated, the system of the present invention checks the
dictionary database to determine whether there is an appropriate domain or
sub-domain dictionary for the given core language pair and for the context determined to
best suit the translation goal of the user, as at 140. If so, the dictionary is selected as
at 142 and deployed as at 150, before the translated text is ultimately transferred as at
152 in accordance with the user's original request.

If the appropriate specialized dictionary is determined not to be available
within the dictionary database, the system of the present invention can deploy search
agents as at 144 to search the Internet for the appropriate specialized dictionary. In
one embodiment of the invention, if the dictionary database does not contain the
appropriate specialized dictionary, the system of the present invention can translate
the desired communication according to a core language pair dictionary available
within the dictionary database.

If the search agents locate a suitable specialized dictionary for the given
communication context, the system of the present invention can then provide an
appropriate interface to allow the translation server in connection with the present
invention to translate the desired communication using the located specialized
dictionary as at 150. If the located dictionary is found to be satisfactory, such as by
repeated use over time or by the measured quality of translation results (which can be
measured by human translators), the system of the present invention can act to
institute licensing proceedings for the compensation and/or license of the located
dictionary from its discovered owner, as described herein.
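The Fig. 5 selection order can be sketched as a fallback chain: a stored dictionary that suits the detected context, then a matching database dictionary, then Internet search agents, then the core language-pair dictionary. The function name, the "dictionary name to context label" database layout and the `search_agents` callback are hypothetical, context detection (step 138) is assumed to have already produced a label, and the "added consideration" for stored dictionaries is modelled simply as checking them first.

```python
from typing import Callable, Optional, Sequence


def choose_dictionary(
    stored: Sequence[str],
    detected_context: str,
    database: dict[str, str],
    search_agents: Callable[[str], Optional[str]],
    core: str,
) -> str:
    """Hypothetical Fig. 5 selection; database maps dictionary name -> context label."""
    # Steps 132-136: prefer a previously used or stored dictionary that fits.
    for name in stored:
        if database.get(name) == detected_context:
            return name
    # Steps 140-142: otherwise take any database dictionary for this context.
    for name, context in database.items():
        if context == detected_context:
            return name
    # Step 144: deploy search agents for an external dictionary.
    found = search_agents(detected_context)
    # Fall back to the core language-pair dictionary when nothing is found.
    return found if found is not None else core


DB = {"es-en energy": "energy", "es-en gas": "energy", "es-en medical": "medical"}
```

Checking stored dictionaries first is what lets a returning user's preferred dictionary win when several dictionaries fit the same context.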



Multilingual Searching 

The progression of processing that occurs during a multilingual search for web
pages in accordance with one embodiment of the present invention can occur as
follows, with reference to Fig. 1. First, the end user device 20 can transmit keywords
via the Internet 14 to the interface server 16. The transmitted keywords are to be used
for performing a search for web pages containing and/or relating to the keywords.
The end user device 20 may also transmit to the interface server 16 an identifier of a
target language in which the user desires to search. The identifier of the target
language may specify a single target language or multiple target languages. Next, the
interface server 16 passes the user-input keywords and the identifier of the target
language to the translation server 18. The translation server 18 is capable of
converting text from one language to another language. The translation server 18
returns the translated keywords to the interface server 16. As mentioned above,
communications between the interface server and the translation server may occur via
a direct network connection or via the Internet.

Next, the interface server 16 initiates a query of the search engine database 25
for the locations of web pages that contain and/or relate to the translated keywords.
Alternatively, the interface server 16 may pass the translated keywords to a search
engine of another service provider (not shown), which may initiate the query of the
search engine database 25. Next, the search engine database 25 returns the results of
the query to the interface server 16. The search results may include URLs, and titles,
abstracts and/or summaries of web pages identified in the search engine database 25
that contain and/or relate to the translated keywords. As is well known in the art, the
search results may also include other types of information about each identified web
page, such as a creation date, a relevancy score, a file size, etc. Thus, the search
results may contain various textual portions written in the target language, making
further translation desirable prior to presenting the search results to the end user
device 20.
Next, the interface server 16 passes the search results to the translation server
18 for translation to the user's native language. More specifically, the interface server
16 may pass textual portions of the search results to the translation server 18 for
translation to the user's native language. Also, the interface server 16 may pass URLs
corresponding to web pages identified in the search results to the translation server 18.
The translation server 18 may modify URLs so that retrieval of web pages may be
directed through the interface server 16, rather than directly through the Internet.
Those skilled in the art will appreciate that modification of URLs may be performed
at the interface server 16 or at another web server (not shown), instead of at the
translation server 18. Furthermore, those skilled in the art should recognize that the
scope of the present invention is not meant to be limited by the described
configuration, in which interface and translation functions are separated between the
exemplary interface server 16 and the exemplary translation server 18. Interface and
translation functions may be included within a single gateway web server, or may be
divided among any number of interconnected web servers.
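The URL modification described above can be sketched as a simple rewrite: each result link is wrapped in a request to the interface server, which can later unwrap it, fetch the page and have it translated. The gateway address and the `url`/`lang` query parameter names are hypothetical; only the standard `urllib.parse` functions are real.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical interface-server endpoint that fetches and translates pages.
GATEWAY = "https://gateway.example.com/fetch"


def rewrite_url(original: str, target_lang: str) -> str:
    """Point a search-result link at the interface server instead of the origin site."""
    return f"{GATEWAY}?{urlencode({'url': original, 'lang': target_lang})}"


def unwrap_url(rewritten: str) -> tuple[str, str]:
    """On the interface server, recover the original URL and the requested language."""
    params = parse_qs(urlparse(rewritten).query)
    return params["url"][0], params["lang"][0]
```

Because `urlencode` percent-encodes the original address, links that carry their own query strings survive the round trip intact.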

Next, the translation server 18 returns the translated search results to the
interface server 16, where they are assembled into a translated results page. The
interface server 16 then passes the translated results page to the end user's device via
the Internet. The translated results page may include titles, abstracts, summaries and
other information relating to the identified web pages that has been translated into
the user's native language. Accordingly, the present invention provides the ability for
the user to enter keywords in the user's native language, direct that a search be
performed on those keywords in another language, and receive search results
information summarizing or identifying the uncovered web pages in the user's native
language.

Fig. 6 shows a block diagram depicting another method of performing a
keyword search in accordance with the present invention. As shown in Fig. 6, once
the user has input source and target languages and the keywords to be used in
searching (step 160), the system can determine whether the user has also pre-selected
a dictionary to be used in translating the keywords or phrase (step 162). If so, the text
of the keywords is transferred to the translation server as at 164, and the text is
translated accordingly, as at 166. If the user has not pre-selected a dictionary, the
system, through the translation server, analyzes the input text to determine which
dictionary would be best suited to conduct the translation, as at 168. If a suitable
dictionary is available within the database (determined at 170), that dictionary is
selected as at 172 and translation is conducted as at 166. If no dictionary in the
database is determined to be appropriate, the system of the present invention can
perform an Internet search as at 174 using the search engine capabilities of the interface
server. If a suitable dictionary is found over the Internet, the interface software of the
system then allows the translation server to translate the keyword or key phrase
text using the found dictionary as at 166.

Upon performing a keyword search of the Internet, as at 176, using the search
engine (25 of Fig. 1), and receiving the search results as at 178, the system of the
present invention can then translate the results back into the source language as at 182
using a dictionary selected in a similar manner to the selection of the first dictionary
(step 180). The translated results can then be transmitted to the requesting user as at
184.
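The Fig. 6 round trip (translate the keywords, search, then translate each result back with a separately chosen dictionary) can be sketched with the components passed in as callables. Every callable here is a hypothetical stand-in for the dictionary selector, translation server and search engine; the function only fixes the order of operations.

```python
from typing import Callable, Iterable, List

# (text, source, target, dictionary) -> translated text
Translate = Callable[[str, str, str, str], str]


def multilingual_search(
    keywords: str,
    source: str,
    target: str,
    pick_dictionary: Callable[[str, str, str], str],
    translate: Translate,
    search: Callable[[str], Iterable[str]],
) -> List[str]:
    """Hypothetical Fig. 6 flow; returns results in the user's source language."""
    fwd_dict = pick_dictionary(keywords, source, target)      # steps 162-172 (or 174)
    query = translate(keywords, source, target, fwd_dict)     # step 166
    results = list(search(query))                              # steps 176-178
    translated = []
    for snippet in results:
        back_dict = pick_dictionary(snippet, target, source)  # step 180
        translated.append(translate(snippet, target, source, back_dict))  # step 182
    return translated                                          # step 184: send to user
```

Passing the components in as callables keeps the flow independent of where each function runs, consistent with the note above that interface and translation functions may live on one server or many.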

Multilingual E-mail 

The gateway 12 in accordance with the present invention can also be
configured for translating and routing e-mail communications (i.e., e-mail messages)
between various network elements. The terms "e-mail communication" and "e-mail
message" are used synonymously herein. In one embodiment of the present
invention, the gateway can be configured to be compatible with existing e-mail client
and server software. Therefore, as will be appreciated by one of ordinary skill in the
art, a first level of interface for the gateway can be a public SMTP server. As is
generally known in the art, an SMTP server is an integral part of an e-mail
system and is responsible for routing e-mail messages between e-mail systems. The
public gateway SMTP server is designed to accept e-mail messages from a DNS
(domain name server) and to pass those e-mail messages to a gateway Mail Agent for
processing and routing. The combination of the SMTP server and the Mail Agent
represents a specially configured gateway interface server 16.
The gateway Mail Agent may be operable to extract textual portions from an e-mail
message and to send those extracted textual portions to the translation server 18.
Alternatively, functionality for extracting textual portions from an e-mail message may
be included in the translation server 18. In one embodiment of the invention, the
translation server 18 may comprise one or more machine translation engines.
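The Mail Agent's extraction step could look like the sketch below, which uses Python's standard `email` package. The `translate` callable stands in for translation server 18 and is hypothetical; only `text/plain` parts are handled, and HTML parts, attachments and encoded Subject headers are left untouched.

```python
from email import message_from_bytes
from email.message import Message
from typing import Callable


def translate_email(raw: bytes, translate: Callable[[str], str]) -> Message:
    """Hypothetical Mail Agent step: translate every text/plain part in place."""
    msg = message_from_bytes(raw)
    for part in msg.walk():
        if part.get_content_type() == "text/plain" and not part.is_multipart():
            # Decode the body using the part's declared charset (UTF-8 if none).
            charset = part.get_content_charset() or "utf-8"
            text = part.get_payload(decode=True).decode(charset, errors="replace")
            # Drop the old transfer encoding so set_payload can choose a new one.
            del part["Content-Transfer-Encoding"]
            part.set_payload(translate(text), charset="utf-8")
    return msg
```

The returned message keeps its original headers, so it can be handed back to the gateway SMTP server for routing.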

In an exemplary embodiment, the translated e-mail services of the present
invention may be integrated with an existing e-mail system, such that an interface
server 16 is used as a gateway into the existing e-mail system. For example, if all
users of an existing e-mail system are to be offered translation services, an exemplary
embodiment may encapsulate the existing e-mail system. In such a configuration,
those skilled in the art will appreciate that providing users with access to the interface
server 16 may be accomplished by updating a DNS server to point the SMTP domain
name(s) to the gateway SMTP server. Alternatively, if the goal is to enable a premium
translation service for only selected users, an exemplary embodiment may be
configured to supplement existing e-mail systems. To supplement existing e-mail
systems, users may be given the option to update their client software to point to the
domain name assigned to the gateway SMTP server. For example, an ISP may want to
offer translated e-mail as a premium service for users. If a pre-existing SMTP server
is located at smtp.myisp.com, the ISP may define a new domain name, such as
newsmtp.myisp.com, corresponding to the gateway SMTP server and then direct all
premium users to the new address. Of course, the reverse approach is also possible,
wherein the pre-existing SMTP server is assigned a new SMTP domain name.

Key contributing factors to implementing an embodiment of the present
invention wherein translated e-mail services are offered via a gateway into existing
e-mail systems may be: a desire to maintain existing e-mail infrastructure; the ability
to offer mixed services, i.e., "traditional" and "translated" e-mail; a desire to maintain
existing internal client base software; and a desire to maintain external access
(i.e., addresses). In cases where an existing e-mail infrastructure is tightly integrated
with other services or policies, a gateway configuration such as that provided by the
present invention may add the desired translation capabilities while maintaining the
existing e-mail infrastructure. A gateway configuration may also prove critical for
speed of implementation and cost of services.

Some e-mail installations may desire to maintain their existing client base
software, such as e-mail client utility, address books and history folders. In addition,
client settings may be difficult to update. As such, the client software may be
seamlessly integrated into a gateway configuration of the present invention. For
example, the server-side DNS may be updated to point to new IP address(es) assigned
to gateway SMTP server(s). Also, the gateway SMTP servers may be assigned the
IP addresses of pre-existing SMTP servers, which in turn may be assigned new
addresses. Another important factor considered by the present invention is the desire
to maintain the external address space assigned to the existing internal users. For
example, if the users of the system have mailboxes on myisp.com, such as
someuser@myisp.com, it may be desirable and practical to maintain this schema. A
gateway configuration allows the external address space to be easily maintained.

From a reading of the description above pertaining to the disclosed
embodiments of the present invention, modifications and variations thereto may
become apparent to those skilled in the art. For instance, the gateway of the present
invention may also be adapted to interact with "chat room" application programs to
enable multilingual "chatting" over a distributed network. Also, the translation component
of the present invention may be adapted to handle all types of communications described
herein, either simultaneously or individually. Other alternatives and variations may also
become apparent to those of ordinary skill in the art upon a close examination of this
specification in view of the drawings.

Multilingual SMS 

Short message service (SMS) is a globally accepted wireless service that
enables mobile subscribers to transmit alphanumeric (e.g., text) messages using a
wireless handset and/or cellular telephone. Transmissions can occur between mobile
subscribers and external systems such as electronic mail, paging, and voice-mail
systems. The messages are generally no more than 140-160 characters in length.
Similar to e-mail, short messages are stored and forwarded at SMS centers (SMSCs),
which means messages can be retrieved later if the recipient is not immediately
available to receive them. SMS messages travel to the cell phone over the system's
control channel, which is separate and apart from the voice channel. The North
American protocol for passing cellular subscriber information from one carrier to
another is Interim Standard 41, or IS-41, which supports short messages.
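The 140-160 character range follows from the fixed 140-octet user-data field of a single short message: how many characters fit depends on how many bits each character occupies in the chosen encoding. The following Python sketch illustrates only that arithmetic. The encoding labels are names chosen here for illustration, and the length check is a simplification that ignores GSM extension-table characters (which take two septets) and multi-part concatenated messages.

```python
# Fixed user-data capacity of a single short message: 140 octets.
SMS_PAYLOAD_OCTETS = 140

# Bits per character for common SMS data coding schemes
# (illustrative labels, not taken from any standard API).
BITS_PER_CHAR = {
    "gsm7": 7,   # GSM 7-bit default alphabet
    "8bit": 8,   # 8-bit data
    "ucs2": 16,  # UCS-2, e.g. for Japanese or Chinese text
}


def max_chars(encoding: str) -> int:
    """Return how many characters fit in one single-part short message."""
    return (SMS_PAYLOAD_OCTETS * 8) // BITS_PER_CHAR[encoding]


def fits_single_sms(text: str, encoding: str) -> bool:
    """Rough check that assumes exactly one coding unit per character."""
    return len(text) <= max_chars(encoding)


if __name__ == "__main__":
    for enc in BITS_PER_CHAR:
        print(enc, max_chars(enc))  # gsm7 160, 8bit 140, ucs2 70
```

One consequence for translated SMS is that a message translated into Japanese is typically sent in UCS-2, where a single message holds only 70 characters, so a translation of a full 160-character English message may need to be split across several messages.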



Short codes can be used as part of an SMS system. Essentially a direct
response medium, short codes let people send SMS messages simply by dialing a
four-, five-, or six-digit number, rather than the 10-digit numbers used in person-to-person
text messaging. Short codes are easier to remember and easier to type than
their longer counterparts, and letting users send a short, easy code in response to a
promotion makes it more likely that they will engage with the campaign.
These numbers are of interest to carriers because they can be billed at varying rates.
They are of interest to marketers because they represent an easy way for consumers to
use their mobile phones to respond to promotions and to ask for content, including
call-to-action campaigns in print ads or on billboards, or text voting for TV viewers.
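A message-handling system may need to tell a campaign short code apart from an ordinary subscriber number, for instance before deciding whether a destination identifies a person whose language should be looked up. The minimal Python sketch below classifies a dialed destination purely by digit count, using the four-to-six-digit and 10-digit lengths given above. The function name and the "other" category are hypothetical, and real numbering plans have more cases than this.

```python
def classify_destination(number: str) -> str:
    """Classify a dialed SMS destination by its digit count.

    Assumes, per the description above, that short codes have 4-6 digits
    and person-to-person numbers have 10 digits; real plans vary.
    """
    # Keep only ASCII digits so formatting such as "202-555-0100" is ignored.
    digits = "".join(ch for ch in number if ch in "0123456789")
    if 4 <= len(digits) <= 6:
        return "short_code"
    if len(digits) == 10:
        return "person_to_person"
    return "other"
```

For example, `classify_destination("12345")` returns `"short_code"`, while `classify_destination("202-555-0100")` returns `"person_to_person"`.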

Fig. 8 shows an example network architecture for an IS-41 SMSC deployment
handling multiple input sources, including a voice-mail system 201, Web-based
messaging 203, e-mail integration 205, and other external short message entities 207.
It will be appreciated that a functionally similar SMS architecture could also be
employed in other wireless networks, such as a global system for mobile
communications (GSM) wireless network. The signal transfer point 213 allows for
communication with wireless network elements such as the home location register
211 and mobile switching center 215.

As shown in Fig. 8, the SMSC 200 acts as a store-and-forward system for
short messages. The SMSC 200 is a combination of hardware and software
responsible for relaying, storing, and forwarding a short message between
any of the short message entities 201, 203, 205, 207 and mobile device 210. With
SMS, an active mobile handset 210 is able to receive or submit a short message at any
time via air interface 220, independent of whether a voice or data call is in progress
(in some implementations, this may depend on the mobile switching center or SMSC
capabilities). SMS also guarantees delivery of the short message by the network.
Temporary failures due to unavailable receiving stations are identified, and the short
message is stored in the SMSC until the destination device becomes available.
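The store-and-forward behavior described above (deliver immediately when the destination is reachable, otherwise hold the message until the destination becomes available) can be sketched as a toy in-memory model. This Python sketch is illustrative only: `ShortMessage`, `StoreAndForwardCenter`, and its methods are hypothetical names, and a real SMSC adds persistence, retry schedules, expiry, and SS7 signaling.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Set


@dataclass
class ShortMessage:
    sender: str
    recipient: str
    text: str


class StoreAndForwardCenter:
    """Toy model of SMSC store-and-forward behavior (not a real SMSC API)."""

    def __init__(self) -> None:
        self.reachable: Set[str] = set()
        self.pending: Dict[str, Deque[ShortMessage]] = defaultdict(deque)
        self.delivered: List[ShortMessage] = []

    def submit(self, msg: ShortMessage) -> bool:
        """Deliver at once if the recipient is reachable; otherwise store it."""
        if msg.recipient in self.reachable:
            self.delivered.append(msg)
            return True
        self.pending[msg.recipient].append(msg)
        return False

    def station_unavailable(self, recipient: str) -> None:
        """Mark a station as temporarily unreachable."""
        self.reachable.discard(recipient)

    def station_available(self, recipient: str) -> int:
        """Mark a station reachable and forward its stored messages in order."""
        self.reachable.add(recipient)
        queue = self.pending.pop(recipient, deque())
        self.delivered.extend(queue)
        return len(queue)
```

Keeping one FIFO queue per recipient preserves the order in which stored messages were submitted once the destination device becomes available.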

Devices 201, 203, 205 and 207 can receive or send short messages. It will be
appreciated that a short message entity (SME) may be located in the fixed network,
a mobile device, or another service center. In a typical SMS environment, the voice-mail
system 201 is responsible for receiving, storing, and playing voice messages
intended for a subscriber who was busy or not available to take a voice call. It is also
responsible for sending voice-mail notifications for those subscribers to the SMSC
200. World Wide Web 203 interconnections are also supported for the submission of
messages and notifications. SMS also provides the ability to deliver e-mail
notifications and to support two-way e-mail, using an SMS-compliant terminal. The
SMSC must support interconnection to e-mail servers (e.g., 205) acting as message
input/output mechanisms.

The signal transfer point 213 is a network element typically available on intelligent
network (IN) deployments that allows IS-41 interconnections over Signaling System 7
(SS7) links with multiple network elements. SS7 is a telecommunications industry
standard signaling protocol. SMS service makes use of the SS7 mobile application part
(MAP), which defines the methods and mechanisms of signaling communication in
mobile or wireless networks. The MAP protocol uses the transaction capabilities
application part (TCAP) component of the SS7 protocol, and both North American
and international standards bodies have defined a MAP layer using the services of the
SS7 TCAP component.

The home location register (HLR) 211 is a database platform for permanently
storing and managing mobile service subscriptions, user profiles, and user location
information for users belonging to the same network as the HLR. A visitor location
register (VLR) is a database element used to temporarily store information about
subscribers who are currently roaming in the area serviced by that VLR. This
information is needed by the mobile switching center (MSC) 215 to service visiting
subscribers. The VLR can belong to the subscriber's home network or to a non-home
network. In many cases, VLR databases are integrated within mobile switching
center network elements. The HLR and VLR store information for properly routing
voice calls or data communications to the mobile user. This can include the international
mobile subscriber identity (IMSI), mobile identification number (MIN), mobile
directory number (MDN), and mobile station international ISDN number (MSISDN),
as well as VLR and mobile switching center identification information associated
with the user.
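To make the HLR's routing role concrete, the following Python sketch models a subscriber record holding a subset of the identifiers listed above (IMSI, MSISDN, serving VLR and MSC) and a lookup that returns delivery routing information. `SubscriberRecord`, `HomeLocationRegister`, and their methods are hypothetical names for illustration; they are not the MAP operations a real HLR exposes.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SubscriberRecord:
    imsi: str
    msisdn: str
    serving_vlr: Optional[str] = None  # None while the handset is not attached
    serving_msc: Optional[str] = None


class HomeLocationRegister:
    """Toy HLR keyed by MSISDN (field set simplified from the description)."""

    def __init__(self) -> None:
        self._by_msisdn: Dict[str, SubscriberRecord] = {}

    def provision(self, record: SubscriberRecord) -> None:
        """Permanently store a subscription belonging to this network."""
        self._by_msisdn[record.msisdn] = record

    def update_location(self, msisdn: str, vlr_id: str, msc_id: str) -> None:
        """Record the VLR/MSC now serving the subscriber, e.g. after roaming.

        Raises KeyError if the subscriber was never provisioned.
        """
        rec = self._by_msisdn[msisdn]
        rec.serving_vlr, rec.serving_msc = vlr_id, msc_id

    def routing_info(self, msisdn: str) -> Optional[Tuple[str, str]]:
        """Return (IMSI, serving MSC) for delivery, or None if unreachable."""
        rec = self._by_msisdn.get(msisdn)
        if rec is None or rec.serving_msc is None:
            return None
        return rec.imsi, rec.serving_msc
```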

The mobile switching center 215 performs the switching functions of the
system and controls calls to and from other telephone and data systems. The MSC
delivers the short message to the identified user through the proper base station. The
air interface 220 is defined based on the given wireless technologies (e.g., GSM,
TDMA, and CDMA), which specify how the voice or data signals are transferred
from the MSC to the handset and back. These technologies also specify the
utilization of transmission frequencies, considering the available bandwidth and the
system's capacity constraints.

The HLR 211 provides the routing information for the indicated user when
prompted by the SMSC 200. If the destination station was not available when
message delivery was attempted, the HLR 211 informs the SMSC 200 once the station
is recognized by the mobile network as accessible, so that the message can then
be delivered.

In providing an automatic translation of SMS messages, the present invention 
can parse the SMS message, filter abbreviations, interpret the delivered message, 
screen the call identification information and establish an appropriate language pair 
for translation. 
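Of the steps listed above, abbreviation filtering is the easiest to illustrate in isolation. The specification does not say how it is performed; the Python sketch below assumes a simple dictionary lookup applied word by word before the text is handed to the translator. The `ABBREVIATIONS` table and `expand_abbreviations` function are hypothetical, and a deployed system would need a much larger, language-specific list.

```python
import re

# Hypothetical abbreviation table for English SMS text.
ABBREVIATIONS = {
    "u": "you",
    "r": "are",
    "thx": "thanks",
    "2moro": "tomorrow",
    "pls": "please",
}

# A "word" here is a run of ASCII letters, digits, or apostrophes.
_TOKEN = re.compile(r"[A-Za-z0-9']+")


def expand_abbreviations(text: str) -> str:
    """Replace known SMS abbreviations with full words before translation."""

    def _replace(match: "re.Match[str]") -> str:
        word = match.group(0)
        # Lookup is case-insensitive; unknown words are left untouched.
        return ABBREVIATIONS.get(word.lower(), word)

    return _TOKEN.sub(_replace, text)
```

For example, `expand_abbreviations("thx, see u 2moro")` yields `"thanks, see you tomorrow"`. Punctuation and spacing are preserved because only the matched word tokens are substituted, although the original capitalization of an expanded abbreviation is not.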

A block diagram of a communication and translation system 300 according to
one embodiment of the present invention is shown in Fig. 9. As shown therein,
mobile devices 210 receive phone calls through a voice communication channel 232
and hypermedia information from remote server devices through broadband 234 and
narrowband 236 (e.g., SMS) data communication channels, which can include
wireless gateway 238 and SMSC 200. Mobile devices can include mobile phones,
personal digital assistants, and/or palm-sized computing devices with voice and data
transmission and/or reception capabilities. Hypermedia can include Extensible
Markup Language (XML) documents, Hypertext Markup Language (HTML)
documents, Compact HTML (cHTML) documents, Handheld
Device Markup Language (HDML) documents, Wireless Markup Language (WML)
documents, or other similar data types.

Mobile devices 210 are provided with a display, user interface, and appropriate
software stored within memory for processing received hypermedia information, and
can be coupled to server 238 through wireless network 220. Mobile devices 210 can
also be provided with speakers and microphones for transmitting and receiving
audible communications. Wireless network 220 can be one of the wireless
communication networks known in the art, such as, for example, a cellular digital
packet data (CDPD) network, a GSM network, an IS-41 network, a Code Division
Multiple Access (CDMA) network, or a Time Division Multiple Access (TDMA) network.
Wireless network 220 can use various communication protocols such as, for example,
Wireless Application Protocol (WAP) or Handheld Device Transport Protocol (HDTP).
Wireless gateway 238 is further coupled to a separate network 240, and network 240 is
coupled to translation gateway 12 and, in the embodiment of Fig. 9, a networked
server farm 250.

The mobile device user can access the voice communication channel 232 once
the device is recognized by the network 220, such as through the exchange of
identification information between the mobile device and network 220. Device and/or
user identification information can be stored in the memory of the device and
transmitted automatically when the user attempts to access the network, as is known
in the art.

Translation gateway 12 includes the capabilities described above, and an
appropriate speech-to-text converter 103 can be provided at the voice communication
channel interface to the translation gateway 12. Server farm 250 can provide access
to hypermedia information, including information to be sent to mobile devices 210.

Both broadband and narrowband data communication channels can receive data
from and deliver data to mobile devices.

A mobile device user desiring to send a translated message to another user
according to the present invention can do so by voice or text. If doing so by voice,
the user first establishes a voice channel as shown at 232. Once a voice channel is
established, speech is received by the speech-to-text converter and processing occurs
as described above. If doing so by text, whether by broadband or narrowband
communication, the user enters the text on device 210 and presses the "submit" or
other appropriate button on the device. If the user is pre-selecting the language pair
for translation, the user can so specify as described in connection with the user
interface in Fig. 4. If the user's text or speech is to be analyzed for topic detection
and/or context recognition, procedures similar to those defined earlier will occur at
translation server 18.

Thus, it can be seen that users of mobile devices 210 in accordance with the
present invention can access language translation services without the significant
hardware or software modifications that might be required if the translation services
were executed by the device itself. Additionally, since the software performing
translation processing is resident on an accessible remote server device with superior
processing speed and large storage capacity, the user of the device can be provided
with the functionality and resources associated with a full-featured speech translation
application, including access to large language dictionaries, selectable language
dictionaries for multiple languages, and user-specific files (e.g., voice templates and
user-customized dictionaries and lists). It will be appreciated that the present
invention is operable regardless of device or device operating system. For example,
mobile devices 210 can operate using various operating systems and platforms such as
Java 2 Micro Edition (J2ME™), Binary Runtime Environment for Wireless (BREW™) by
Qualcomm™, Symbian™, Linux™, Palm™, .NET, and the RIM BlackBerry™
operating system.



In one embodiment, the user's source language and the intended recipient's
target language are automatically determined based on information detected in the
message sending process. Respective sending and receiving party identification
information can be detected in a variety of ways. Detection can occur automatically
based on the device used or based on the sent message. For example, the sender's
device can be recognized by the network 220, and an associated cellular telephone
number can be detected and compared to a previously established database of
telephone numbers. Since the beginning portion of a telephone number typically
indicates the country or area code associated with the device, the present invention
can use this code to associate a language dictionary with the intended translation.
For example, the user's telephone may be registered in the United States with a "202"
area code, which would mean the user's telephone is associated with the Washington,
DC region of the United States. Thus, the user's language would be pre-established as
English.

This method can be employed based on the recipient's phone information as
well. For example, if the user intends to send an SMS message to Japan, the user
would employ the country code "81". Once this information is detected, the present
invention can compare the identification information with previously stored
identification information from translation database 24, and can then select the
English-Japanese translation dictionary to translate the user's message from English
to Japanese automatically and in real time. The text of the message can also be
analyzed for topic detection and context recognition, as described above, to obtain the
appropriate contextual English-Japanese dictionary for translation. If the present
invention detects a topic change within the SMS message, multiple dictionaries may
be employed "on the fly" to provide the most accurate complete message translation
from English to Japanese, in this example.
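The country-code lookup described in the two preceding paragraphs can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: numbers are given in international E.164 form, and `COUNTRY_CODE_LANGUAGE`, `language_for_number`, and `language_pair` are hypothetical names standing in for the comparison against translation database 24. Because ITU-T country codes are prefix-free, at most one table entry can match a given number.

```python
from typing import Optional, Tuple

# Hypothetical country-code-to-language table. Country code "1" covers the
# whole North American Numbering Plan, so a real system would need finer
# rules (e.g. area codes) or stored subscriber preferences.
COUNTRY_CODE_LANGUAGE = {"1": "en", "33": "fr", "49": "de", "81": "ja"}


def language_for_number(e164: str) -> Optional[str]:
    """Guess a language from the country code of an E.164 number."""
    digits = e164.lstrip("+")
    # Country codes are 1-3 digits and prefix-free, so at most one matches.
    for length in (1, 2, 3):
        lang = COUNTRY_CODE_LANGUAGE.get(digits[:length])
        if lang is not None:
            return lang
    return None


def language_pair(sender: str, recipient: str) -> Optional[Tuple[str, str]]:
    """Return (source, target) languages, or None if either is unknown."""
    src, tgt = language_for_number(sender), language_for_number(recipient)
    if src is None or tgt is None:
        return None
    return src, tgt
```

With a sender in the "202" area code and a recipient in Japan, `language_pair("+12025550100", "+81312345678")` returns `("en", "ja")`, which would then select the English-Japanese dictionary.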

In another embodiment, the present invention can detect the international
direct dialing prefix used by the sender. For example, if the sender uses the
international dialing prefix "011", the system can detect that the user is dialing from
the United States (or another North American Numbering Plan country) and can again
choose English as a default source language for the impending translation. In still a
further embodiment, the sender's or the recipient's language for translation can be
determined based on either party's mobile subscriber integrated services digital network
(MSISDN) number, international mobile subscriber identity (IMSI), electronic mail
(e-mail) address, or Internet protocol (IP) address. Such items may be pre-associated
with a given language to assist in the automatic determination of which language pair
to employ for a given SMS message to be translated.
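The two mechanisms in the preceding paragraph, an identifier pre-associated with a language and a default inferred from the international dialing prefix, can be combined into a single lookup. In the Python sketch below, the priority order (pre-associated identifiers first, dialing prefix second) is an assumption not stated in the specification, and `IDD_PREFIX_DEFAULT_LANGUAGE` and `source_language` are hypothetical names. Only "011" is listed because the widely used prefix "00" is shared by many countries and does not point to a single origin.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical table: "011" is the international prefix dialed from North
# American Numbering Plan countries, including the United States.
IDD_PREFIX_DEFAULT_LANGUAGE = {"011": "en"}


def source_language(
    dialed: str,
    sender_ids: Sequence[str],
    preassociated: Mapping[str, str],
) -> Optional[str]:
    """Pick the sender's language for translation.

    Assumed priority: identifiers pre-associated with a language (MSISDN,
    IMSI, e-mail address, IP address) win; otherwise fall back to the
    default language implied by the international dialing prefix.
    """
    for ident in sender_ids:
        if ident in preassociated:
            return preassociated[ident]
    for prefix, lang in IDD_PREFIX_DEFAULT_LANGUAGE.items():
        if dialed.startswith(prefix):
            return lang
    return None
```

For example, a sender dialing "01181312345678" with no stored preference is assigned English, while a sender whose e-mail address is pre-associated with French is assigned French regardless of the prefix dialed.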

The invention may be embodied in other specific forms without departing 
from the spirit or essential characteristics thereof. The present embodiments are 
therefore to be considered in all respects as illustrative and not restrictive, the scope of 
the invention being indicated by the claims of the application rather than by the 
foregoing description, and all changes which come within the meaning and range of 
equivalency of the claims are therefore intended to be embraced therein. 

What is claimed and desired to be secured by Letters Patent is: 